Fast Lightweight Suffix Array Construction and Checking

Authors

  • Stefan Burkhardt
  • Juha Kärkkäinen
Abstract

We describe an algorithm that, for any v ∈ [2, n], constructs the suffix array of a string of length n in O(vn + n log n) time using O(v + n/√v) space in addition to the input (the string) and the output (the suffix array). By setting v = log n, we obtain an O(n log n) time algorithm using O(n/√(log n)) extra space. This solves the open problem stated by Manzini and Ferragina [ESA ’02] of whether there exists a lightweight (sublinear extra space) O(n log n) time algorithm. The key idea of the algorithm is to first sort a sample of suffixes chosen using mathematical constructs called difference covers. The algorithm is not only lightweight but also fast in practice as demonstrated by experiments. Additionally, we describe fast and lightweight suffix array checkers, i.e., algorithms that check the correctness of a suffix array.
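
The two notions named in the abstract, difference covers and suffix array checking, can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names (is_difference_cover, common_shift, check_suffix_array) are hypothetical, the set {1, 2, 4} modulo 7 is just one small example of a difference cover, and the checker shown here keeps a full inverse array, so it uses linear extra space and is not lightweight in the sense of the paper.

```python
def is_difference_cover(d, v):
    """True if every residue in [0, v) equals (a - b) mod v for some a, b in d."""
    residues = {(a - b) % v for a in d for b in d}
    return len(residues) == v


def common_shift(i, j, d, v):
    """Smallest k in [0, v) such that (i + k) mod v and (j + k) mod v are both in d.

    Such a k exists for every pair (i, j) when d is a difference cover modulo v.
    This is what lets two arbitrary suffixes be compared via two sampled
    suffixes after looking at fewer than v characters.
    """
    for k in range(v):
        if (i + k) % v in d and (j + k) % v in d:
            return k
    return None  # only reachable if d is not a difference cover modulo v


def check_suffix_array(text, sa):
    """Check that sa lists the suffixes of text in lexicographic order.

    Assumption: a simple linear-time check using an inverse (rank) array,
    chosen for clarity rather than for low memory use.
    """
    n = len(text)
    if len(sa) != n:
        return False
    # Step 1: sa must be a permutation of 0..n-1; build its inverse.
    rank = [-1] * n
    for i, p in enumerate(sa):
        if not (0 <= p < n) or rank[p] != -1:
            return False
        rank[p] = i
    # Step 2: adjacent suffixes must be ordered by first character, and ties
    # must be resolved consistently with the order of the suffixes one
    # position further to the right.
    for i in range(1, n):
        a, b = sa[i - 1], sa[i]
        if text[a] > text[b]:
            return False
        if text[a] == text[b]:
            if b == n - 1:
                return False  # one-character suffix b is a proper prefix of suffix a
            if a != n - 1 and rank[a + 1] >= rank[b + 1]:
                return False
    return True


if __name__ == "__main__":
    print(is_difference_cover({1, 2, 4}, 7))
    print(common_shift(0, 3, {1, 2, 4}, 7))
    print(check_suffix_array("banana", [5, 3, 1, 0, 4, 2]))
```

In this sketch the sample would consist of the suffixes starting at positions i with i mod v in the cover. The size of the cover relative to v is what drives the n/√v term in the space bound quoted above.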

Similar resources

Fast and Lightweight LCP-Array Construction Algorithms

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...

Lightweight LCP-Array Construction in Linear Time

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...

Lightweight Parameterized Suffix Array Construction

We present a first algorithm for direct construction of parameterized suffix arrays and parameterized longest common prefix arrays for non-binary strings. Experimental results show that our algorithm is much faster than naïve methods.

An Incomplex Algorithm for Fast Suffix Array Construction

Our aim is to provide full text indexing data structures and algorithms for universal usage in text indexing. We present a practical algorithm for suffix array construction. The fundamental algorithm is less complex than other construction algorithms. We achieve very fast construction times for common strings as well as for worst case strings by enhancing our basic algorithms with further techn...

Engineering a Lightweight External Memory Suffix Array Construction Algorithm

We describe an external memory suffix array construction algorithm based on constructing suffix arrays for blocks of text and merging them into the full suffix array. The basic idea goes back over 20 years and there has been a couple of later improvements, but we describe several further improvements that make the algorithm much faster. In particular, we reduce the I/O volume of the algorithm by a fa...

Publication date: 2003